{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "168721af",
   "metadata": {},
   "source": [
    "# Use third party language models with ArcGIS"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "371479f3",
   "metadata": {},
   "source": [
    "## Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9efc4d60",
   "metadata": {},
   "source": [
    "The Text Analysis toolset in the GeoAI toolbox provides a set of tools for text processing tasks, such as text classification, entity extraction, and text translation. The natural language processing (NLP) models that are created using these tools are built on language models such as BERT, RoBERTa, and T5, and large language models (LLMs), such as Mistral, to ensure high-performance text analysis.\n",
    "\n",
    "While these prebuilt models are robust, you may need a custom NLP workflow, such as when using an LLM for translating text, performing sentiment analysis, or extracting custom entities or relationships not currently supported by the Text Analysis toolset. \n",
    "\n",
    "To meet these needs, you can integrate ArcGIS with external third-party language models. This includes open source LLMs as well as cloud-hosted, commercial LLMs accessed using a web API. Keep in mind that if you are using a web hosted LLM, the data that you are processing will be sent to the LLM provider for processing. Python developers can author a custom NLP function to integrate with external models and package their model as an Esri deep learning package (.dlpk file) for use with the following tools:\n",
    "\n",
    "1. [Classify Text Using Deep Learning](https://pro.arcgis.com/en/pro-app/latest/tool-reference/geoai/classify-text-using-deep-learning.htm)\n",
    "2. [Extract Entities Using Deep Learning](https://pro.arcgis.com/en/pro-app/latest/tool-reference/geoai/extract-entities-using-deep-learning.htm)\n",
    "3. [Transform Text Using Deep Learning](https://pro.arcgis.com/en/pro-app/latest/tool-reference/geoai/transform-text-using-deep-learning.htm)\n",
    "\n",
    "\n",
    "By following the steps outlined in this documentation, you will be able to create and use custom .dlpk files effectively, enhancing your text analysis tasks with models tailored to your needs."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4941d9fb",
   "metadata": {},
   "source": [
    "## Custom Python NLP Function"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6bf800ce",
   "metadata": {},
   "source": [
    "You can create an NLP function in Python to integrate third-party language models into a text processing pipeline. NLP functions handle text data and perform various text processing tasks."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6973c6c7",
   "metadata": {},
   "source": [
    "### Anatomy of a Python NLP Function"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb0c423a",
   "metadata": {},
   "source": [
    "#### `__init__`\n",
    "\n",
    "The `__init__` method initialize instance variables such as name, description, and other attributes essential for the NLP function.\n",
    "\n",
    "#### `initialize`\n",
    "\n",
    "The `initialize` method is used to load the NLP model and set up any initial configurations. Use this method at the start of the Python NLP function.\n",
    "\n",
    "#### `getParameterInfo`\n",
    "\n",
    "The `getParameterInfo` method specifies the parameters that the NLP function accepts. This includes any configuration settings required to load or connect to the model as well as parameters needed for text processing.\n",
    "\n",
    "#### `getConfiguration`\n",
    "\n",
    "The `getConfiguration` method describe how the function will perform input processing and generate outputs. It includes details for any preprocessing or postprocessing steps necessary for the function.\n",
    "\n",
    "#### `predict`\n",
    "\n",
    "The `predict` method responsible for converting input text to the desired output. It uses defined parameters and processing logic to produce the final result.\n"
   ]
  },
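  {
   "cell_type": "markdown",
   "id": "3fa9c210",
   "metadata": {},
   "source": [
    "Taken together, the five methods form a class with the following shape. This is a minimal illustrative skeleton (names and method bodies are placeholders, not a complete implementation):\n",
    "\n",
    "```python\n",
    "class CustomNLPFunction:\n",
    "    def __init__(self, **kwargs):\n",
    "        # Set the attributes that describe the function\n",
    "        self.name = \"My NLP function\"\n",
    "        self.description = \"Describes what the function does.\"\n",
    "\n",
    "    def initialize(self, **kwargs):\n",
    "        # kwargs['model'] is the path to the .emd file; load the model here\n",
    "        self.model = None\n",
    "\n",
    "    def getParameterInfo(self):\n",
    "        # Return a list of parameter dictionaries\n",
    "        return []\n",
    "\n",
    "    def getConfiguration(self, **kwargs):\n",
    "        # Store the updated parameter values and return them\n",
    "        return kwargs\n",
    "\n",
    "    def predict(self, feature_set, **kwargs):\n",
    "        # Process the input FeatureSet and return a FeatureSet of results\n",
    "        return feature_set\n",
    "```"
   ]
  },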
  {
   "cell_type": "markdown",
   "id": "014a6dc9",
   "metadata": {},
   "source": [
    "### In-Depth Method Overview\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17735a1e",
   "metadata": {},
   "source": [
    "#### `__init__`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e30241ac",
   "metadata": {},
   "source": [
    "The `__init__` method is the constructor of the custom NLP function and initializes instance variables such as the name, description, and other essential attributes. This method sets the properties that will define the behavior and characteristics of the NLP function.\n",
    "\n",
    "The constructor creates an instance of a custom NLP class with all the attributes required for processing and analysis. When creating an instance of an NLP class, this method ensures that it has the required settings and default values. For example, if the NLP function needs specific settings such as paths to models, or special tokens, they can be set up in this method."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80b4742c",
   "metadata": {},
   "source": [
    "\n",
    "```python\n",
    "class MyTextClassifier:\n",
    "    def __init__(self, **kwargs):\n",
    "        \"\"\"\n",
    "        It sets up the initial state of an object by defining its attributes,\n",
    "        such as name, description, and other properties.\n",
    "\n",
    "        \"\"\"\n",
    "        self.name = \"Text classifier\"\n",
    "        self.description = '''The `MyTextClassifier` class is designed to perform text classification tasks\n",
    "                            such as categorizing text data into predefined categories.'''\n",
    "        # Additional initialization code here\n",
    "        ...\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9958400",
   "metadata": {},
   "source": [
    "#### `initialize`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba02b6f9",
   "metadata": {},
   "source": [
    "The initialize method is called at the start of the custom Python NLP function and to this method, the kwargs['model'] is passed. The kwargs['model'] argument is the path to the Esri model definition file (.emd). The method should be used to load the model weights to set up the NLP model, ensuring a reference to it for subsequent operations."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8095c1f5",
   "metadata": {},
   "source": [
    "```python\n",
    "def initialize(self, **kwargs):\n",
    "    \"\"\"\n",
    "    Initialize model parameters, such as loading pretrained model weights.\n",
    "    \"\"\"\n",
    "    json_file = kwargs['model']\n",
    "    with open(json_file, 'r') as f:\n",
    "        self.json_info = json.load(f)\n",
    "    \n",
    "    # access the model path in the model definition file\n",
    "    model_path = json_info['ModelFile']\n",
    "    # load your model and keep an instance for the model\n",
    "    self.model = load_your_model(model_path)\n",
    "    \n",
    "    # Additional initialization code here\n",
    "    ...\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11e3da28",
   "metadata": {},
   "source": [
    "#### `getParameterInfo`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23337791",
   "metadata": {},
   "source": [
    "The getParameterInfo method is called by the Text Analysis tools after the initialize method and is where the parameters needed by the model are defined. This method returns a list of input parameters expected by the custom NLP function. Each parameter is described using a dictionary containing the name, data type, display name, and description of the parameter, and a Boolean parameter indicating whether the parameter is required, as shown below."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a0b943a",
   "metadata": {},
   "source": [
    "```python\n",
    "def getParameterInfo(self):\n",
    "    return [\n",
    "        {\n",
    "            \"name\": \"class_names\",\n",
    "            \"dataType\": \"string\",\n",
    "            \"required\": True,\n",
    "            \"displayName\": \"class_names\",\n",
    "            \"description\": \"Comma-separated list of class names used for classification.\",\n",
    "            \"value\": \"positive,negative\"\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"prompt\",\n",
    "            \"dataType\": \"string\",\n",
    "            \"required\": False,\n",
    "            \"displayName\": \"prompt\",\n",
    "            \"description\": \"The number of samples processed in one forward pass through the model.\",\n",
    "            \"value\": \"Classify the following text into the defined classes.\"\n",
    "        },\n",
    "        # Additional code here\n",
    "        ...\n",
    "    ]\n",
    "    \n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e5ff87a",
   "metadata": {},
   "source": [
    "**Returns**: The method returns a list of dictionaries, each describing a parameter.\n",
    "\n",
    "Key attributes of the dictionary are as follows:\n",
    "\n",
    "- **`name`**: A string identifier for the parameter\n",
    "- **`dataType`**: The type of data the parameter accepts, such as a `string`, `Boolean`, or `list`\n",
    "- **`value`**: The default value for the parameter\n",
    "- **`required`**: A Boolean indicating whether the parameter is required\n",
    "- **`displayName`**: A user-friendly name for the parameter\n",
    "- **`domain`**: A set of allowed values, if applicable\n",
    "- **`description`**: A detailed description of the parameter\n",
    "\n",
    "The list of parameters is displayed to the user through the custom model's model arguments in the Text Analysis tools. Users of the model can set these values interactively using the tool user interface or programmatically pass them into the getConfiguration method as keyword arguments."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b6d674aa",
   "metadata": {},
   "source": [
    "#### `getConfiguration`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19ddcbdc",
   "metadata": {},
   "source": [
    "The `getConfiguration` method sets up and manages the parameters for the NLP function. It is passed keyword arguments containing the parameters updated by users of the model through the tool or provided programmatically. The method also controls how the function processes and outputs data based on the updated parameters. This method is invoked after the getParameterInfo method but before the predict method. The return value of the function is a dictionary containing value of the `batch_size` indicating how many strings the model can process at a time. The return value of the method informs the tool how the input data needs to be split for processing by the model one batch at a time.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87e5433e",
   "metadata": {},
   "source": [
    "```python\n",
    "def getConfiguration(self, **kwargs):\n",
    "    \"\"\"\n",
    "    This method configures the supported NLP function parameters and\n",
    "    controls further processing based on them.\n",
    "    \"\"\"\n",
    "    # Set the class names from the provided arguments\n",
    "    self.class_names = kwargs.get(\"class_names\", \"\")\n",
    "    self.prompt = kwargs.get(\"prompt\", \"\")\n",
    "    # Set the batch size, limiting it to a maximum of 4\n",
    "    if kwargs.get(\"batch_size\", 0) > 4:\n",
    "        kwargs[\"batch_size\"] = 4\n",
    "       \n",
    "    # Additional code here\n",
    "    ...\n",
    "    \n",
    "    # Return the updated parameter values\n",
    "    return kwargs\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30cd88e7",
   "metadata": {},
   "source": [
    "In the example above, the custom NLP function is configured by doing the following:\n",
    "\n",
    "- Setting the `class_names` parameter from the provided arguments.\n",
    "- Setting the `prompt` parameter from the provided arguments.\n",
    "- Limiting the `batch_size` parameter to a maximum of 4 if a larger value is provided.\n",
    "- Returning the updated parameter values."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6798934",
   "metadata": {},
   "source": [
    "#### `predict`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a46a855b",
   "metadata": {},
   "source": [
    "The `predict` method performs inference, that is, it generates predictions with the NLP model. This method is passed a FeatureSet containing the input features (or rows in the case of a table) and kwargs containing the field name which contains the input strings. This method returns the results in the form of a FeatureSet object. The following is a typical workflow:\n",
    "\n",
    "- Extract the input text to be processed from the provided FeatureSet and preprocess it to match the model’s requirements.\n",
    "- Apply the NLP model to the preprocessed text to generate predictions.\n",
    "- Refine or format the model’s predictions as needed.\n",
    "- Package the processed predictions into a FeatureSet object and return it.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "820b2c3e",
   "metadata": {},
   "source": [
    "```python\n",
    "def predict(self, feature_set: FeatureSet, **kwargs):\n",
    "    \"\"\"\n",
    "    Predicts the classification of text data from the input FeatureSet and returns a new FeatureSet with the predictions.\n",
    "    \"\"\"\n",
    "    # read the text from the input Featureset, when calling this function from ArcGIS Pro tools, be sure to use the name of the column that contains the text data instead of `input_str`.\n",
    "    field = kwargs[\"input_field\"]\n",
    "    input_text = feature_set.df[field].to_list() \n",
    "    # Preprocessing input code here\n",
    "    # \n",
    "    ... \n",
    "    \n",
    "    # Make Predictions\n",
    "    results = self.model.predict(input_text, self.class_names, self.prompt)\n",
    "    \n",
    "    # Additional code here\n",
    "    ... \n",
    "    \n",
    "    # Create featureset\n",
    "    feature_dict = {\n",
    "        \"fields\": [\n",
    "            {\"name\": \"input_str\", \"type\": \"esriFieldTypeString\"},\n",
    "            {\"name\": \"class\", \"type\": \"esriFieldTypeString\"}\n",
    "        ],\n",
    "        'geometryType': \"\",\n",
    "        'features': [\n",
    "            {'attributes': {'input_str': text, 'class': label}}\n",
    "            for text, label in zip(input_text, results)\n",
    "        ]\n",
    "    }\n",
    "    \n",
    "    # Return the featureset\n",
    "    return FeatureSet.from_dict(feature_dict)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62a2a40b",
   "metadata": {},
   "source": [
    "## Esri Model Definition (.emd) File"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b68e249e",
   "metadata": {},
   "source": [
    "After creating a custom Python NLP function, include a reference to the function in the `.emd` file by specifying it next to the InferenceFunction parameter. This ensures that the `.emd` file correctly links to the function, enabling it to be used in an NLP processing pipeline."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "81615ba8",
   "metadata": {},
   "source": [
    "```json\n",
    "{\n",
    "    \"InferenceFunction\": \"MyTextClassifier.py\",\n",
    "    \"ModelType\": \"TextClassifier\",\n",
    "    \"OutputField\": \"ClassLabel\",\n",
    "    \"Labels\": [\n",
    "        \"positive\",\n",
    "        \"negative\"\n",
    "    ],\n",
    "\n",
    "\n",
    "    # additional keys here\n",
    "    ...\n",
    "}\n",
    "```\n",
    "\n",
    "Note:\n",
    "The .emd file must include the following keys:\n",
    "\n",
    "- InferenceFunction — Specify the name of the file containing the custom NLP function.\n",
    "- ModelType — Indicate the type of model based on its task. Supported values are TextClassifier, SequenceToSequence, and EntityRecognizer.\n",
    "- OutputField — Provide the name of the output field that will store the results for TextClassifier or SequenceToSequence models."
   ]
  },
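  {
   "cell_type": "markdown",
   "id": "b7e1d4f8",
   "metadata": {},
   "source": [
    "As a quick sanity check before packaging, these required keys can be validated with the standard `json` module. The helper below is an optional sketch, not part of the ArcGIS API:\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "REQUIRED_KEYS = {\"InferenceFunction\", \"ModelType\"}\n",
    "SUPPORTED_MODEL_TYPES = {\"TextClassifier\", \"SequenceToSequence\", \"EntityRecognizer\"}\n",
    "\n",
    "def validate_emd(path):\n",
    "    \"\"\"Raise ValueError if the .emd file is missing required keys.\"\"\"\n",
    "    with open(path) as f:\n",
    "        emd = json.load(f)\n",
    "    missing = REQUIRED_KEYS - emd.keys()\n",
    "    if missing:\n",
    "        raise ValueError(f\"Missing required keys: {sorted(missing)}\")\n",
    "    if emd[\"ModelType\"] not in SUPPORTED_MODEL_TYPES:\n",
    "        raise ValueError(f\"Unsupported ModelType: {emd['ModelType']}\")\n",
    "    # OutputField is required for TextClassifier and SequenceToSequence models\n",
    "    if emd[\"ModelType\"] != \"EntityRecognizer\" and \"OutputField\" not in emd:\n",
    "        raise ValueError(\"OutputField is required for this ModelType\")\n",
    "    return emd\n",
    "```\n",
    "\n",
    "Running `validate_emd(\"TextClassifier.emd\")` before zipping the package catches a missing key early."
   ]
  },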
  {
   "cell_type": "markdown",
   "id": "a5490415",
   "metadata": {},
   "source": [
    "## Custom Deep Learning Package (.dlpk) Model File\n",
    "\n",
    "To complete a custom NLP setup, create a .dlpk model file. The .dlpk file allows you to use a model with the with the `arcgis.learn` Python API and the inference tools in the Text Analysis toolset.\n",
    "\n",
    "Organize the files as follows:\n",
    "\n",
    "1. **Create a Folder:** Create a folder and include the custom NLP function file (for example, MyTextClassifier.py) and the Esri .emd file (for example, TextClassifier.emd). The name of the folder must match the name of the .emd file.\n",
    "   Example folder structure:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed5e4a77",
   "metadata": {},
   "source": [
    "```\n",
    "TextClassifier/\n",
    "├── MyTextClassifier.py\n",
    "└── TextClassifier.emd\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de30fb69",
   "metadata": {},
   "source": [
    "Include any additional files or folders as necessary for the NLP function.\n",
    "\n",
    "2. **Zip the Folder:** Compress the folder into a ZIP archive. Rename the .zip file to match the name of the .emd file but with the .dlpk extension. Example final file name:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9dba74d",
   "metadata": {},
   "source": [
    "```\n",
    "TextClassifier.dlpk\n",
    "```"
   ]
  },
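  {
   "cell_type": "markdown",
   "id": "9ac83b52",
   "metadata": {},
   "source": [
    "The zip-and-rename step can also be scripted with Python's standard library. The sketch below creates the folder and placeholder files so it is self-contained; in practice the folder already holds your NLP function and .emd file:\n",
    "\n",
    "```python\n",
    "import shutil\n",
    "from pathlib import Path\n",
    "\n",
    "# Demo setup only: create the folder and placeholder files\n",
    "folder = Path(\"TextClassifier\")\n",
    "folder.mkdir(exist_ok=True)\n",
    "(folder / \"MyTextClassifier.py\").touch()\n",
    "(folder / \"TextClassifier.emd\").touch()\n",
    "\n",
    "# Zip the folder so the folder itself sits at the root of the archive\n",
    "zip_path = shutil.make_archive(\n",
    "    str(folder), \"zip\", root_dir=folder.parent, base_dir=folder.name\n",
    ")\n",
    "\n",
    "# Rename the .zip to .dlpk, matching the .emd file name\n",
    "dlpk_path = Path(zip_path).replace(folder.with_suffix(\".dlpk\"))\n",
    "print(dlpk_path)  # TextClassifier.dlpk\n",
    "```"
   ]
  },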
  {
   "cell_type": "markdown",
   "id": "24501dd6",
   "metadata": {},
   "source": [
    "\n",
    "This `.dlpk` file is now ready for use with the `arcgis.learn` Python API and can be incorporated into the text analysis tools inside `ArcGIS Pro`.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e89402c",
   "metadata": {},
   "source": [
    "## Using the Custom .dlpk with the arcgis.learn API"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da397fa7",
   "metadata": {},
   "source": [
    "The `arcgis.learn` Python API supports for third party custom models through various classes, including `TextClassifier`, `EntityRecognizer`, and `SequenceToSequence`. To utilize your custom `.dlpk` file for inference with these classes, follow the steps below. The example provided demonstrates how to use the `TextClassifier` class, but you can apply similar steps for the other supported classes."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53afac4d",
   "metadata": {},
   "source": [
    "### Example with TextClassifier"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26a42451",
   "metadata": {},
   "source": [
    "```python\n",
    "from arcgis.learn.text import TextClassifier\n",
    "\n",
    "model = TextClassifier.from_model(\n",
    "\"path_to_your_custom_dlpk_file\"\n",
    ")\n",
    "\n",
    "results = model.predict(\"Input String or list\")\n",
    "\n",
    "results_df = results.df\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b57d9cf",
   "metadata": {},
   "source": [
    "\n",
    "1. **Import the TextClassifier Class:**\n",
    "   Begin by importing the `TextClassifier` class from the `arcgis.learn.text` module.\n",
    "   \n",
    "   \n",
    "\n",
    "2. **Load the Custom Model:**\n",
    "   Use the `from_model` method to load your custom `.dlpk` file. Replace `\"path_to_your_custom_dlpk_file\"` with the actual path to your `.dlpk` file. You can also define additional keyword arguments as needed.\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "3. **Perform Inference:**\n",
    "Use the predict method of the TextClassifier class to make predictions. You can pass a single input string or a list of strings.\n",
    "\n",
    "4. **Extract Results:**\n",
    "Convert the results to a DataFrame and extract the information as needed.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c05c73fc",
   "metadata": {},
   "source": [
    "## Utilizing Custom Models with the Text Analysis Toolset in ArcGIS Pro\n",
    "\n",
    "The Text Analysis Toolset, part of the GeoAI toolbox in ArcGIS Pro, supports model extensibility through a variety of advanced tools. These tools include:\n",
    "\n",
    "- **Classify Text Using Deep Learning**: For categorizing text into predefined classes.\n",
    "- **Extract Entities Using Deep Learning**: For identifying and extracting specific entities from text.\n",
    "- **Transform Text Using Deep Learning**: For applying deep learning models to transform text data.\n",
    "\n",
    "To leverage your custom `.dlpk` file for inference with these tools, follow the steps outlined below. Although the example provided focuses on using the **Classify Text Using Deep Learning** tool, the process is similar for other supported tools such as **Extract Entities Using Deep Learning** and **Transform Text Using Deep Learning**."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67b0d230",
   "metadata": {},
   "source": [
    "### Example: Using Custom `.dlpk` File with the Classify Text Using Deep Learning Tool"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b6218c8",
   "metadata": {},
   "source": [
    "To utilize the **Classify Text Using Deep Learning** tool effectively, follow these steps:\n",
    "\n",
    "1. **Locate the Tool**: \n",
    "   Find the **Classify Text Using Deep Learning** tool within the **GeoAI Toolbox**. This tool is located under the **Text Analysis Toolset**. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85c44539",
   "metadata": {},
   "source": [
    "![GeoAI Toolbox Navigation]()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "289bd145",
   "metadata": {},
   "source": [
    "2. **Load Your Custom Model**:\n",
    "   In the tool's interface, locate the **Input Model Definition File** parameter and load your custom `.dlpk` model file. For example, if your model file is named `MyTextClassifier.dlpk`, select it. \n",
    "\n",
    "\n",
    "Complete the other mandatory parameters as needed. For further details, refer to the [official documentation](https://pro.arcgis.com/en/pro-app/latest/tool-reference/geoai/classify-text-using-deep-learning.htm).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "92aab24a",
   "metadata": {},
   "source": [
    "![Input Model parameter File]()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf3a9b33",
   "metadata": {},
   "source": [
    "3. **Configure Model Arguments**:\n",
    "   **Model Arguments** will have the parameters you have defined in the `getParameterInfo()` method of your model. Update these parameters as needed for your model to function correctly. In the example I have given `class_names` and `prompt` parameters.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b79099fc",
   "metadata": {},
   "source": [
    "![Model Arguments]()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "468ae0f9",
   "metadata": {},
   "source": [
    "4. **Execute the Tool**:\n",
    "   After filling in all required parameters, click the **Run** button. This will process the input data and add a new column with the predictions to your input table.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35286ae8",
   "metadata": {},
   "source": [
    "By following these steps, you can effectively use the **Classify Text Using Deep Learning** tool with your custom model in ArcGIS Pro."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65ffb41a",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "The `arcgis.learn` API and the Text Analysis toolset in ArcGIS Pro's GeoAI toolbox enable model extensibility, allowing users to integrate custom natural language processing (NLP) models into their text analysis workflows. This documentation provides a comprehensive guide on how to create custom .dlpk files and use them with these tools and the API."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
